Complexity of Existential Positive First-Order Logic 



Manuel Bodirsky, Miki Hermann 
LIX (UMR CNRS 7161), Ecole Polytechnique, 91 128 Palaiseau, France 
{bodirsky, hermann}@lix.polytechnique.fr

Florian Richoux* 
JFLI, CNRS - University of Tokyo, Japan 
richoux@jfli.itc.u-tokyo.ac.jp 



Abstract 

Let Γ be a (not necessarily finite) structure with a finite relational signature. We prove that deciding
whether a given existential positive sentence holds in Γ is in LOGSPACE or complete for the class
CSP(Γ)_NP under deterministic polynomial-time many-one reductions. Here, CSP(Γ)_NP is the class
of problems that can be reduced to the constraint satisfaction problem of Γ under non-deterministic
polynomial-time many-one reductions.

1 Introduction

We study the computational complexity of the following class of computational problems. Let Γ be a
structure with finite or infinite domain and with a finite relational signature. The model-checking problem
for existential positive first-order logic, parametrized by Γ, is the following problem.

Problem: ExPos(Γ)

Input: An existential positive first-order sentence Φ
Question: Does Γ satisfy Φ?

Existential positive first-order formulas over Γ are first-order formulas without universal quantifiers,
equalities, and negation symbols, and are formally defined as follows:

- if R is a relation symbol of a relation from Γ with arity k and x_1, ..., x_k are (not necessarily distinct)
variables, then R(x_1, ..., x_k) is an existential positive first-order formula (such formulas are called atomic);

- if φ and ψ are existential positive first-order formulas, then φ ∧ ψ and φ ∨ ψ are existential positive
first-order formulas;

- if φ is an existential positive first-order formula with a free variable x, then ∃x.φ is an existential positive
first-order formula.

*This work was done during the PhD studies of the third author at Ecole Polytechnique.

An existential positive first-order sentence is an existential positive first-order formula without free variables.
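
The inductive definition above can be mirrored by a small syntax tree. The following Python sketch is only an illustration: the class names `Atom`, `And`, `Or`, `Exists` and the helpers `free_variables` and `is_sentence` are assumptions made here, not notation from the paper, and the sketch does not enforce that a quantified variable actually occurs free in the body.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Atom:
    """Atomic formula R(x_1, ..., x_k); variables need not be distinct."""
    relation: str
    variables: Tuple[str, ...]


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Exists:
    variable: str
    body: "Formula"


Formula = Union[Atom, And, Or, Exists]


def free_variables(phi):
    """Return the set of variables that occur free in phi."""
    if isinstance(phi, Atom):
        return frozenset(phi.variables)
    if isinstance(phi, (And, Or)):
        return free_variables(phi.left) | free_variables(phi.right)
    if isinstance(phi, Exists):
        return free_variables(phi.body) - {phi.variable}
    raise TypeError(f"not a formula: {phi!r}")


def is_sentence(phi):
    """A sentence is a formula without free variables."""
    return not free_variables(phi)
```

Note that there is deliberately no node for negation, universal quantification, or equality, which matches the fragment defined above.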

Note that we do not allow the equality symbol in the existential positive sentences; this only makes our
results stronger, since one can always add a relation symbol = for the equality relation to the signature
of Γ to obtain the result for the case where the equality symbol is allowed. Also note that adding a symbol
for equality to Γ might change the complexity of ExPos(Γ). Consider for example Γ := (N; ≠); here,
ExPos(Γ) can be reduced to the Boolean formula evaluation problem (which is known to be in LOGSPACE)
as follows: atomic formulas in Φ of the form x ≠ y (for distinct variables x and y) are replaced by true, and
atomic formulas of the form x ≠ x are replaced by false. The resulting Boolean formula is equivalent to true
if and only if Φ is true in Γ. However, the problem ExPos(Γ') for Γ' := (N; ≠, =) is NP-complete. Similar
examples exist over finite domains.
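
The reduction for (N; ≠) described above can be sketched in a few lines of Python. This is an illustrative sketch under assumptions of our own: formulas are encoded as nested tuples `("atom", x, y)`, `("and", f, g)`, `("or", f, g)`, `("exists", x, f)`, and the function name `holds_in_naturals_neq` is hypothetical.

```python
def holds_in_naturals_neq(phi):
    """Decide an existential positive sentence over (N; !=).

    Every atom x != y with two different variable names is replaced by True,
    every atom x != x by False, and the resulting Boolean formula is evaluated.
    Quantifiers are simply skipped.
    """
    kind = phi[0]
    if kind == "atom":
        _, x, y = phi
        return x != y  # compares variable names, not values
    if kind == "and":
        return holds_in_naturals_neq(phi[1]) and holds_in_naturals_neq(phi[2])
    if kind == "or":
        return holds_in_naturals_neq(phi[1]) or holds_in_naturals_neq(phi[2])
    if kind == "exists":
        return holds_in_naturals_neq(phi[2])
    raise ValueError(f"unknown formula node: {kind!r}")


# Exists x, y. (x != y) and ((x != x) or (y != x))
example = ("exists", "x", ("exists", "y",
           ("and", ("atom", "x", "y"),
                   ("or", ("atom", "x", "x"), ("atom", "y", "x")))))
print(holds_in_naturals_neq(example))  # True
```

The sketch is correct for (N; ≠) because the domain is infinite, so all variables can be given pairwise distinct values at once; the same shortcut is not valid once = is added to the signature.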

The constraint satisfaction problem CSP(Γ) for Γ is defined similarly, but its input consists of a primitive
positive sentence, that is, an existential positive sentence without disjunctions. Constraint satisfaction
problems frequently appear in many areas of computer science, and have attracted a lot of attention, in
particular in combinatorics, artificial intelligence, finite model theory and universal algebra; we refer to the
recent collection of survey articles on this subject [7]. The class of constraint satisfaction problems for
infinite structures Γ is a rich class of problems; it can be shown that for every computational problem there
exists a relational structure Γ such that CSP(Γ) is equivalent to that problem under polynomial-time Turing
reductions [2].

In this paper, we show that the complexity classification for existential positive first-order sentences over
infinite structures can be reduced to the complexity classification for constraint satisfaction problems. For
finite structures Γ, our result implies that ExPos(Γ) is in LOGSPACE or NP-complete. The LOGSPACE-solvable
cases of ExPos(Γ) are in this case precisely those relational structures Γ with an element a such
that all non-empty relations in Γ contain the tuple (a, ..., a); in this case, Γ is called a-valid.
Interestingly, this is no longer true for infinite structures Γ. To see this, consider again the structure Γ :=
(N; ≠), which is clearly not a-valid, but for which ExPos(Γ) is in LOGSPACE, as we have noticed above.

A universal-algebraic study of the model-checking problem for finite structures Γ and various other
syntactic restrictions of first-order logic (for instance positive first-order logic) can be found in [9].

A preliminary version of this article appeared in [?]. The present version differs in that the main proof
has been simplified and now also works without the relation symbol for equality; moreover, Proposition 3
and Section 4 have been added.

2 Main Result 

We write L ≤_m L' if there exists a deterministic polynomial-time many-one reduction from L to L'.

Definition 1 (from [6]) A problem A is non-deterministic polynomial-time many-one reducible to a problem
B (A ≤_NP B) if there is a non-deterministic polynomial-time Turing machine M such that x ∈ A if
and only if there exists a computation of M that outputs y on input x, and y ∈ B. We denote by A_NP the
smallest class that contains A and is downward closed under ≤_NP.

Observe that ≤_NP is transitive [13]. To state the complexity classification for existential positive
first-order logic, we need the following concept. The Γ-localizer F(ψ) of a formula ψ is defined as follows:

• F(∃x.ψ) = F(ψ)

• F(φ ∧ ψ) = F(φ) ∧ F(ψ)

• F(φ ∨ ψ) = F(φ) ∨ F(ψ)

• When ψ is atomic, then F(ψ) = true if ψ is satisfiable in Γ, and F(ψ) = false otherwise.

Definition 2 We call a structure Γ locally refutable if every existential positive sentence Φ is true in Γ if
and only if the Γ-localizer F(Φ) is logically equivalent to true.
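
As a minimal sketch of the Γ-localizer, assume a finite structure that is given explicitly, so that satisfiability of an atomic formula can be checked by brute force. The Python code below computes the truth value of F(Φ) and compares it with a naive model checker. The tuple encoding of formulas and the names `atom_satisfiable`, `localizer`, and `holds` are illustrative assumptions, not part of the paper.

```python
from itertools import product


def atom_satisfiable(structure, atom):
    """Brute-force test whether the atom ("atom", R, vars) is satisfiable."""
    domain, relations = structure
    _, name, variables = atom
    distinct = sorted(set(variables))
    for values in product(domain, repeat=len(distinct)):
        env = dict(zip(distinct, values))
        if tuple(env[v] for v in variables) in relations[name]:
            return True
    return False


def localizer(structure, phi):
    """Truth value of the localizer F(phi): atoms are judged in isolation."""
    kind = phi[0]
    if kind == "atom":
        return atom_satisfiable(structure, phi)
    if kind == "and":
        return localizer(structure, phi[1]) and localizer(structure, phi[2])
    if kind == "or":
        return localizer(structure, phi[1]) or localizer(structure, phi[2])
    if kind == "exists":  # ("exists", variable, body)
        return localizer(structure, phi[2])
    raise ValueError(f"unknown formula node: {kind!r}")


def holds(structure, phi, env=None):
    """Naive model checking over a finite structure (exponential time)."""
    domain, relations = structure
    env = env or {}
    kind = phi[0]
    if kind == "atom":
        return tuple(env[v] for v in phi[2]) in relations[phi[1]]
    if kind == "and":
        return holds(structure, phi[1], env) and holds(structure, phi[2], env)
    if kind == "or":
        return holds(structure, phi[1], env) or holds(structure, phi[2], env)
    if kind == "exists":
        return any(holds(structure, phi[2], {**env, phi[1]: a}) for a in domain)
    raise ValueError(f"unknown formula node: {kind!r}")


# Hypothetical structure ({0, 1}; Zero, One) with Zero = {0} and One = {1}.
G = ({0, 1}, {"Zero": {(0,)}, "One": {(1,)}})
phi = ("exists", "x", ("and", ("atom", "Zero", ("x",)), ("atom", "One", ("x",))))
print(localizer(G, phi), holds(G, phi))  # True False
```

In this example the localizer is true while the sentence is false, so the example structure is not locally refutable in the sense of Definition 2.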

Proposition 3 A structure Γ is locally refutable if and only if every unsatisfiable conjunction of atomic
formulas contains an unsatisfiable conjunct.

Proof: First suppose that Γ is locally refutable, and let ψ be a conjunction of atomic formulas with variables
x_1, ..., x_n. Then every conjunct of ψ is satisfiable in Γ if and only if F(ψ) is true. By local refutability of
Γ this is the case if and only if ∃x_1, ..., x_n.ψ is true in Γ, which shows the claim.

Now suppose that Γ is not locally refutable, that is, there is an existential positive sentence Φ that is false
in Γ such that F(Φ) is true. Define recursively for each subformula ψ of Φ where F(ψ) is true the formula
T(ψ) as follows. If ψ is of the form ψ_1 ∨ ψ_2, then for some i ∈ {1, 2} the formula F(ψ_i) must be true, and
we set T(ψ) to be T(ψ_i). If ψ is of the form ψ_1 ∧ ψ_2, then for both i ∈ {1, 2} the formula F(ψ_i) must be
true, and we set T(ψ) to be T(ψ_1) ∧ T(ψ_2). If ψ is of the form ∃x.ψ', we set T(ψ) to be T(ψ'), and if ψ is
atomic, we set T(ψ) to be ψ (we assume that distinct quantifiers in Φ bind distinct variables).

Each conjunct ψ in T(Φ) is satisfiable in Γ since F(Φ) is true. But since Φ is false in Γ, T(Φ) must be
unsatisfiable. □

In Section 3 we will show the following result.

Theorem 4 Let Γ be a structure with a finite relational signature τ. If Γ is locally refutable then the problem
ExPos(Γ) to decide whether an existential positive sentence is true in Γ is in LOGSPACE. If Γ is not locally
refutable, then ExPos(Γ) is complete for the class CSP(Γ)_NP under polynomial-time many-one reductions.

In particular, ExPos(Γ) is in LOGSPACE or is NP-hard (under deterministic polynomial-time many-one
reductions). If Γ is finite, then ExPos(Γ) is in LOGSPACE or NP-complete, because finite domain
constraint satisfaction problems are clearly in NP. The observation that ExPos(Γ) is in LOGSPACE or
NP-complete has previously been made in [5] and independently in [8]. However, our proof remains the
same for finite domains and is simpler than the previous proofs.

3 Proof 

Before we prove Theorem 4, we start with the following simpler result.

Theorem 5 Let Γ be a structure with a finite relational signature τ. If Γ is locally refutable, then the
problem ExPos(Γ) to decide whether an existential positive sentence is true in Γ is in LOGSPACE. If Γ is
not locally refutable, then ExPos(Γ) is NP-hard (under polynomial-time many-one reductions).

To prove Theorem 5, we first need to prove the following lemma.

Lemma 6 A structure Γ is not locally refutable if and only if there are existential positive formulas ψ_0
and ψ_1 with the property that

- ψ_0 and ψ_1 define non-empty relations over Γ;

- ψ_0 ∧ ψ_1 defines the empty relation over Γ.

Proof: The "if"-part of the statement is immediate. To show the "only if"-part, suppose that Γ is not locally
refutable. Then by Proposition 3 there is an unsatisfiable conjunction ψ of satisfiable atomic formulas.
Among all such formulas ψ, let ψ be one of minimal length. Let ψ_0 be one of the atomic formulas in ψ, and
let ψ_1 be the conjunction over the remaining conjuncts in ψ. Since ψ was chosen to be minimal, the formula
ψ_1 must be satisfiable. By construction ψ_0 is also satisfiable and ψ is unsatisfiable, which is what we had to
show. □

Proof of Theorem 5: If Γ is locally refutable, then ExPos(Γ) can be reduced to the positive Boolean
formula evaluation problem, which is known to be in LOGSPACE. We only have to construct from
an existential positive sentence Φ a Boolean formula F := F_Γ(Φ) as described before Definition 2. Clearly,
this construction can be performed with logarithmic work-space. We evaluate F, and reject if F is false,
and accept otherwise.

If Γ is not locally refutable, we show NP-hardness of ExPos(Γ) by reduction from 3-SAT. Let I be a
3-SAT instance. We construct an instance Φ of ExPos(Γ) as follows. Let ψ_0 and ψ_1 be the formulas from
Lemma 6 (suppose they are d-ary). Let v_1, ..., v_n be the Boolean variables in I. For each v_i we introduce d
new variables x̄_i = x_i^1, ..., x_i^d. Let Φ be the instance of ExPos(Γ) that contains the following conjuncts:

• For each 1 ≤ i ≤ n, the formula ψ_0(x̄_i) ∨ ψ_1(x̄_i)

• For each clause l_1 ∨ l_2 ∨ l_3 in I, the formula ψ_{i_1}(x̄_{j_1}) ∨ ψ_{i_2}(x̄_{j_2}) ∨ ψ_{i_3}(x̄_{j_3}) where i_p = 0 if l_p equals
¬v_{j_p} and i_p = 1 if l_p equals v_{j_p}, for all p ∈ {1, 2, 3}.

It is clear that Φ can be computed in deterministic polynomial time from I, and that Φ is true in Γ if and
only if I is satisfiable. □
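
The reduction from 3-SAT can be tried out on a small example. The sketch below assumes the hypothetical structure ({0, 1}; Zero, One) with ψ_0(x) = Zero(x) and ψ_1(x) = One(x), so that d = 1; the tuple encoding of formulas and all function names are illustrative choices, and the brute-force checkers are only there to compare both sides of the equivalence.

```python
from itertools import product

# Assumed example structure: Zero(x) and One(x) are each satisfiable,
# but Zero(x) and One(x) together are not, as required by Lemma 6.
GAMMA = ([0, 1], {"Zero": {(0,)}, "One": {(1,)}})


def psi0(x):
    return ("atom", "Zero", (x,))


def psi1(x):
    return ("atom", "One", (x,))


def reduce_3sat(num_vars, clauses):
    """Build the sentence Phi for a 3-SAT instance over v_1, ..., v_n (n >= 1).

    A clause is a triple of non-zero integers: j encodes v_j, -j its negation.
    """
    names = {i: f"x{i}" for i in range(1, num_vars + 1)}
    conjuncts = [("or", psi0(names[i]), psi1(names[i])) for i in names]
    for clause in clauses:
        lits = [psi1(names[l]) if l > 0 else psi0(names[-l]) for l in clause]
        conjuncts.append(("or", lits[0], ("or", lits[1], lits[2])))
    body = conjuncts[0]
    for conjunct in conjuncts[1:]:
        body = ("and", body, conjunct)
    for i in sorted(names, reverse=True):
        body = ("exists", names[i], body)
    return body


def holds(structure, phi, env=None):
    """Naive model checking over a finite structure (exponential time)."""
    domain, relations = structure
    env = env or {}
    kind = phi[0]
    if kind == "atom":
        return tuple(env[v] for v in phi[2]) in relations[phi[1]]
    if kind == "and":
        return holds(structure, phi[1], env) and holds(structure, phi[2], env)
    if kind == "or":
        return holds(structure, phi[1], env) or holds(structure, phi[2], env)
    return any(holds(structure, phi[2], {**env, phi[1]: a}) for a in domain)


def satisfiable(num_vars, clauses):
    """Brute-force 3-SAT check, used only as a reference."""
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False
```

For instance, `holds(GAMMA, reduce_3sat(3, [(1, 2, -3), (-1, -2, 3)]))` and `satisfiable(3, [(1, 2, -3), (-1, -2, 3)])` agree, as the proof predicts for every instance.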

Applied to finite relational structures Γ, we obtain the result from [5] and [8], that is, ExPos(Γ) is in
LOGSPACE if Γ is a-valid and NP-complete otherwise. We prove in the following proposition that, over a
finite domain D, Γ is locally refutable if and only if it is a-valid for an element a ∈ D.

Proposition 7 Let Γ be a relational structure with a finite domain D. Then Γ is locally refutable if and only
if it is a-valid for an element a ∈ D.

Proof: Suppose that Γ is a-valid, and let Φ be an existential positive sentence over the signature of Γ.
To show that Γ is locally refutable, we only have to show that Φ is true in Γ when F(Φ) is equivalent
to true (since the other direction holds trivially). But this follows from the fact that if an atomic formula
R(x_1, ..., x_n) is satisfiable in Γ then in fact this formula can be satisfied by setting all variables to a.

For the opposite direction of the statement, let D = {a_1, ..., a_n}, and suppose that for all a ∈ D the
structure Γ is not a-valid. That is, for each a_i ∈ D there exists a non-empty relation R_i of arity r_i in Γ such
that (a_i, ..., a_i) ∉ R_i. Let r be Σ_{i=1}^{n} r_i, and let x_1, ..., x_{rn} be distinct variables. Consider the formula

    ψ = ⋀_{ȳ ∈ {x_1, ..., x_{rn}}^r} R_1(y_1, ..., y_{r_1}) ∧ ··· ∧ R_n(y_{r-r_n+1}, ..., y_r)    (1)

By the pigeonhole principle, for every mapping f : {x_1, ..., x_{rn}} → D at least r variables are mapped to
the same value, say to a_i. For a vector ȳ that contains exactly these r variables, for some l there is a conjunct
R_i(y_{l+1}, ..., y_{l+r_i}) in ψ; but by assumption, R_i does not contain the tuple (a_i, ..., a_i). This shows that
∃x_1, ..., x_{rn}.ψ is not true in Γ. On the other hand, since each relation R_i is non-empty, it is clear that the
Boolean formula F(∃x_1, ..., x_{rn}.ψ) is true. Therefore, Γ is not locally refutable. □
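
For a finite structure given explicitly, the criterion of Proposition 7 is easy to test. The following Python sketch lists all elements a for which a structure is a-valid; the representation of relations as `(arity, set_of_tuples)` pairs and the name `valid_elements` are assumptions made for this illustration.

```python
def valid_elements(domain, relations):
    """Return all a in domain such that every non-empty relation contains (a, ..., a).

    relations is a list of pairs (arity, tuples), where tuples is a set of
    arity-long tuples over the domain. Empty relations are ignored.
    """
    result = []
    for a in domain:
        if all((a,) * arity in tuples for arity, tuples in relations if tuples):
            result.append(a)
    return result


# Hypothetical example: E = {(0, 0), (0, 1)}, U = {(0,)}, and an empty ternary relation.
print(valid_elements([0, 1], [(2, {(0, 0), (0, 1)}), (1, {(0,)}), (3, set())]))  # [0]
```

By Proposition 7, a finite structure is locally refutable exactly when the returned list is non-empty.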
Remark 8 In the proof of Theorem 4 it will be convenient to assume that Γ has a single relation R. When
we study the problem CSP(Γ), this is without loss of generality, since we can always find a CSP which is
deterministic polynomial-time equivalent and where the template is of this form: if Γ = (D; R_1, ..., R_n)
where R_i has arity r_i and is not empty, then CSP(Γ) is equivalent to CSP(D; R_1 × ··· × R_n) where
R_1 × ··· × R_n is the (Σ_{i=1}^{n} r_i)-ary relation defined as the Cartesian product of the relations R_1, ..., R_n.
Similarly, ExPos(Γ) is equivalent to ExPos(D; R_1 × ··· × R_n).
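
The product construction of Remark 8 can be sketched as follows. The translation of a single constraint shown here, which pads the blocks of the other relations with fresh variables, is one possible way to realize the equivalence and is an assumption of this sketch; the function names are hypothetical.

```python
from itertools import count, product


def product_relation(relations):
    """Cartesian product R_1 x ... x R_n of non-empty relations (sets of tuples)."""
    return {sum(combo, ()) for combo in product(*relations)}


def translate_constraint(index, variables, arities, fresh):
    """Rewrite R_index(variables) as one constraint over the product relation.

    The block belonging to R_index receives the given variables; every other
    block is filled with fresh variables taken from the iterator `fresh`.
    Because all relations are non-empty, the fresh blocks can always be satisfied.
    """
    args = []
    for i, arity in enumerate(arities):
        if i == index:
            args.extend(variables)
        else:
            args.extend(next(fresh) for _ in range(arity))
    return tuple(args)


R1 = {(0, 1)}
R2 = {(0,), (1,)}
fresh = (f"u{k}" for k in count())
print(sorted(product_relation([R1, R2])))                 # [(0, 1, 0), (0, 1, 1)]
print(translate_constraint(1, ("x",), [2, 1], fresh))      # ('u0', 'u1', 'x')
```

In the other direction, a constraint over the product relation simply splits into one constraint per block.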

Proof of Theorem 4: If Γ is locally refutable then the statement has been shown in Theorem 5. Suppose
that Γ is not locally refutable. To show that ExPos(Γ) is contained in CSP(Γ)_NP, we construct a
non-deterministic Turing machine T which takes as input an instance Φ of ExPos(Γ), and which outputs an
instance T(Φ) of CSP(Γ) as follows.

On input Φ the machine T proceeds recursively as follows:

• if Φ is of the form ∃x.φ then return ∃x.T(φ);

• if Φ is of the form φ_1 ∧ φ_2 then return T(φ_1) ∧ T(φ_2);

• if Φ is of the form φ_1 ∨ φ_2 then non-deterministically return either T(φ_1) or T(φ_2);

• if Φ is of the form R(x_1, ..., x_k) then return R(x_1, ..., x_k).

The output of T can be viewed as an instance of CSP(Γ), since it can be transformed to a primitive positive
sentence (by moving all existential quantifiers to the front). It is clear that T has polynomial running time,
and that Φ is true in Γ if and only if there exists a computation of T on Φ that computes a sentence that is
true in Γ.
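
The non-deterministic machine T can be simulated deterministically by enumerating all of its possible outputs. The Python sketch below does exactly that; it is meant only to illustrate the recursion, takes exponential time in general, and uses a tuple encoding of formulas and a function name `branches` that are assumptions of this illustration.

```python
def branches(phi):
    """Yield every disjunction-free formula that T can output on input phi.

    Formulas are nested tuples: ("atom", R, vars), ("and", f, g),
    ("or", f, g), ("exists", x, f). Each "or" node corresponds to one
    non-deterministic choice of the machine.
    """
    kind = phi[0]
    if kind == "atom":
        yield phi
    elif kind == "exists":
        for body in branches(phi[2]):
            yield ("exists", phi[1], body)
    elif kind == "and":
        for left in branches(phi[1]):
            for right in branches(phi[2]):
                yield ("and", left, right)
    elif kind == "or":
        yield from branches(phi[1])
        yield from branches(phi[2])
    else:
        raise ValueError(f"unknown formula node: {kind!r}")


# Exists x. (A(x) or B(x)) and (C(x) or D(x)) has four possible outputs.
a, b, c, d = (("atom", name, ("x",)) for name in "ABCD")
phi = ("exists", "x", ("and", ("or", a, b), ("or", c, d)))
print(len(list(branches(phi))))  # 4
```

The sentence Φ is true in Γ if and only if at least one of the enumerated outputs is true in Γ, which is the property used in the membership proof.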

We now show that ExPos(Γ) is hard for CSP(Γ)_NP under ≤_m-reductions. Let L be a problem with
a non-deterministic polynomial-time many-one reduction to CSP(Γ), and let M be the non-deterministic
Turing machine that computes the reduction. We have to construct a deterministic Turing machine M' that
computes for any input string s in polynomial time in |s| an instance Φ of ExPos(Γ) such that Φ is true
in Γ if and only if there exists a computation of M on s that computes a satisfiable instance of CSP(Γ).

Say that the running time of M on s is in O(|s|^e) for a constant e. Hence, there are constants s_0 and c
such that for |s| > s_0 the running time of M and hence also the number of constraints in the instance of
CSP(Γ) produced by the reduction is bounded by t := c|s|^e. The non-deterministic computation of M can
be viewed as a deterministic computation with access to non-deterministic advice bits as shown in [4]. We
also know that for |s| > s_0, the machine M can access at most t non-deterministic bits. If w is a sufficiently
long bit-string, we write M_w for the deterministic Turing machine obtained from M by using the bits in w
as the non-deterministic bits, and M_w(s) for the instance of CSP(Γ) computed by M_w on input s.

If |s| ≤ s_0, then M' returns ∃x̄.ψ_1(x̄) if there is a w ∈ {0, 1}* such that M_w(s) is a satisfiable instance
of CSP(Γ), and M' returns ∃x̄.(ψ_0(x̄) ∧ ψ_1(x̄)) otherwise (i.e., it returns a false instance of ExPos(Γ); ψ_0
and ψ_1 are defined in Lemma 6). Since s_0 is a fixed finite value, M' can perform these computations in
constant time.

By Remark 8 made above, we can assume without loss of generality that Γ has just a single relation R.
Let l be the arity of R. Then instances of CSP(Γ) with variables x_1, ..., x_n can be encoded as sequences
of numbers that are represented by binary strings of length ⌈log t⌉ as follows: the i-th number m in this
sequence indicates that the (((i - 1) mod l) + 1)-st variable in the (((i - 1) div l) + 1)-st constraint is x_m.
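
The encoding of CSP instances as number sequences can be made concrete as follows. This Python sketch is an illustration under our own assumptions: constraints are tuples of 1-based variable indices, the function names are hypothetical, and `to_bits` assumes that every number fits into the chosen width.

```python
def decode(numbers, arity):
    """Turn the number sequence into a list of constraints.

    The i-th number m (counting from 1) says that the (((i - 1) mod l) + 1)-st
    variable of the (((i - 1) div l) + 1)-st constraint is x_m, where l = arity.
    """
    if len(numbers) % arity != 0:
        raise ValueError("sequence length must be a multiple of the arity")
    constraints = [[None] * arity for _ in range(len(numbers) // arity)]
    for i, m in enumerate(numbers, start=1):
        position = (i - 1) % arity      # 0-based position inside the constraint
        constraint = (i - 1) // arity   # 0-based index of the constraint
        constraints[constraint][position] = m
    return [tuple(c) for c in constraints]


def encode(constraints):
    """Inverse of decode: flatten the constraints into one number sequence."""
    return [m for constraint in constraints for m in constraint]


def to_bits(numbers, width):
    """Write each number as a binary string of fixed length `width`."""
    return "".join(format(m, f"0{width}b") for m in numbers)


# R(x_1, x_2), R(x_2, x_3), R(x_3, x_1) for a binary relation R
print(decode([1, 2, 2, 3, 3, 1], arity=2))  # [(1, 2), (2, 3), (3, 1)]
print(to_bits([1, 2, 3], width=2))          # 011011
```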

For |s| > s_0, we use a construction from the proof of Cook's theorem given in [4]. In this proof,
a computation of a non-deterministic Turing machine T accepting a language L is encoded by Boolean
variables that represent the state and the position of the read-write head of T at time r, and the content of
the tape at position j at time r. The tape content at time 0 consists of the input x, written at positions 1
through n, and the non-deterministic advice bit string w, written at positions -1 through -|w|. The proof
in [4] specifies a deterministic polynomial-time computable transformation f_L that computes for a given
string s a SAT instance f_L(s) such that there is an accepting computation of T on s if and only if there is a
satisfying truth assignment for f_L(s).

In our case, the machine M computes a reduction and thus computes an output string. Recall our binary
representation of instances of the CSP: M writes on the output tape a sequence of numbers represented by
binary strings of length ⌈log t⌉. It is straightforward to modify the transformation f_L given in the proof of
Theorem 2.1 in [4] to obtain for all positive integers a, b, c where a ≤ t, b ≤ l, c ≤ ⌈log t⌉, and d ∈ {0, 1},
a deterministic polynomial-time transformation g^d_{a,b,c} that computes for a given string s a SAT instance
g^d_{a,b,c}(s) with distinguished variables z_1, ..., z_p, p ≤ t, for the non-deterministic bits in the computation of
M such that the following are equivalent:

• g^d_{a,b,c}(s) has a satisfying assignment where z_i is set to w_i ∈ {0, 1} for 1 ≤ i ≤ p;

• the c-th bit in the b-th variable of the a-th constraint in M_w(s) equals d.

We use the transformations g^d_{a,b,c} to define M' as follows. The machine M' first computes the formulas
g^d_{a,b,c}(s). For every Boolean variable v in these formulas we introduce a new conjunct ψ_0(x̄_v) ∨ ψ_1(x̄_v)
where x̄_v is a d-tuple of fresh variables and ψ_0 and ψ_1 are the two formulas defined in Lemma 6. Then,
every positive literal l = v in the original conjuncts of the formula is replaced by ψ_1(x̄_v), and every negative
literal l = ¬v by ψ_0(x̄_v). We then existentially quantify over all variables except for x̄_{z_1}, ..., x̄_{z_p}. Let
ψ^d_{a,b,c}(s) denote the resulting existential positive formula. For positive integers k and i, we denote as k[i] the
i-th bit in the binary representation of k. Let n be the total number of variables in the CSP instance M_w(s)
(in particular, n ≤ t). It is clear that the formula

    Φ := ∃y_1, ..., y_n, x̄_{z_1}, ..., x̄_{z_p}. ⋀_{a ≤ t; k_1, ..., k_l ≤ n} ( ( ⋀_{b ≤ l, c ≤ ⌈log t⌉} ψ^{k_b[c]}_{a,b,c}(s) ) → R(y_{k_1}, ..., y_{k_l}) )



can be re-written in existential positive form without blow-up: we can replace implications α → β by
¬α ∨ β, and then move the negation to the atomic level, where we can remove negation by exchanging the
role of ψ_0 and ψ_1. Hence, Φ can be computed by M' in polynomial time.

We claim that the formula Φ is true in Γ if and only if there exists a computation of M on s that computes
a satisfiable instance of CSP(Γ). To see this, let w be a sufficiently long bit-string such that M_w(s) is a
satisfiable instance of CSP(Γ). Suppose for the sake of notation that the n variables in M_w(s) are the
variables y_1, ..., y_n. Let α_1, ..., α_n be a satisfying assignment to those n variables. Then, if for 1 ≤ i ≤ n
the variable y_i in the formula Φ is set to α_i, and for 1 ≤ i ≤ p the variables x̄_{z_i} are set to a tuple that satisfies
ψ_d where d is the i-th bit in w, we claim that the inner part of Φ is true in Γ. The reason is that, due to
the way we set the variables of the form x̄_{z_i}, the precondition ⋀_{b ≤ l, c ≤ ⌈log t⌉} ψ^{k_b[c]}_{a,b,c}(s) is true if and only if
R(y_{k_1}, ..., y_{k_l}) is a constraint in M_w(s). Therefore, all the atomic formulas of the form R(y_{k_1}, ..., y_{k_l})
whose precondition is true are satisfied due to the way we set the variables y_i, and hence Φ is true in Γ.
It is straightforward to verify that the opposite implication holds as well, and this shows the claimed
equivalence. □
4 Structures With Function Symbols



In this section, we briefly discuss the complexity of ExPos(Γ) when Γ might also contain functions. That
is, we assume that the signature of Γ consists of a finite set of relation and function symbols, and that the
input formulas for the problem ExPos(Γ) are existential positive first-order formulas over this signature. It
is easy to see from the proofs in the previous section that when Γ is not locally refutable, then ExPos(Γ) is
still NP-hard (with the same definition of local refutability as before).

The case when Γ is locally refutable becomes more intricate when Γ has functions. We present an
example of a locally refutable structure Γ where ExPos(Γ) is NP-hard. Let Γ be the
structure (2^N; ≠, ∩, ∪, c, 0, 1) where ≠ is the binary disequality relation, ∩ and ∪ are binary functions for
intersection and union, respectively, c is a unary function for complementation, and 0, 1 are constants (i.e.,
0-ary functions) for the empty set and the full set N, respectively.

Proposition 9 The structure (2^N; ≠, ∩, ∪, c, 0, 1) is locally refutable.

Proof: By Lemma 6 it suffices to show that if Ψ is a conjunction of atomic formulas that are satisfiable in Γ,
then Ψ is satisfiable over Γ. Since the only relation symbol in the structure is ≠, every conjunct in Ψ is of
the form t_1 ≠ t_2, where t_1 and t_2 are terms formed by variables and the function symbols ∩, ∪, c, 1 and 0.
By Boole's fundamental theorem of Boolean algebras, t = t' can be re-written as t'' = 0. Therefore, Ψ can
be written as t_1 ≠ 0 ∧ ··· ∧ t_n ≠ 0. Since Γ is an infinite Boolean algebra, Theorem 5.1 in [?] shows that
if t_i ≠ 0 is satisfiable in Γ for all i ≤ n, then Ψ is satisfiable in Γ as well. □

Proposition 10 The problem ExPos(2^N; ≠, ∩, ∪, c, 0, 1) is NP-hard.

Proof: The proof is by reduction from SAT. Given a Boolean formula Ψ in CNF with variables x_1, ..., x_n,
we replace each conjunction in Ψ by ∩, each disjunction by ∪, and each negation by c. Let t be the resulting
term over the signature {∩, ∪, c} and variables x_1, ..., x_n. It is easy to verify that ∃x_1, ..., x_n. t ≠ 0 is true
in Γ if and only if Ψ is a satisfiable Boolean formula. □
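
The translation from CNF formulas to set terms can be illustrated with a short Python sketch. Two assumptions are made here that are not in the paper: a small finite universe stands in for N, and the search only tries the empty set and the full set as values, which suffices because any point in a non-empty value of t yields a satisfying truth assignment and vice versa. All names are hypothetical, and the CNF is assumed to be non-empty with non-empty clauses.

```python
from itertools import product


def cnf_to_term(cnf):
    """Translate a CNF (list of clauses of non-zero ints, j = x_j, -j = not x_j)
    into a term over cap (intersection), cup (union) and c (complement)."""
    def literal(l):
        return ("var", l) if l > 0 else ("c", ("var", -l))

    term = None
    for clause in cnf:
        union = None
        for l in clause:
            union = literal(l) if union is None else ("cup", union, literal(l))
        term = union if term is None else ("cap", term, union)
    return term


def evaluate(term, assignment, universe):
    """Evaluate a term to a subset of the universe."""
    kind = term[0]
    if kind == "var":
        return assignment[term[1]]
    if kind == "c":
        return universe - evaluate(term[1], assignment, universe)
    left = evaluate(term[1], assignment, universe)
    right = evaluate(term[2], assignment, universe)
    return left | right if kind == "cup" else left & right


def term_can_be_nonempty(cnf, num_vars, universe=frozenset(range(3))):
    """Search for an assignment of sets to x_1..x_n that makes t non-empty."""
    term = cnf_to_term(cnf)
    for choice in product([frozenset(), universe], repeat=num_vars):
        assignment = dict(enumerate(choice, start=1))
        if evaluate(term, assignment, universe):
            return True
    return False


print(term_can_be_nonempty([[1, 2], [-1]], 2))  # True:  (x1 or x2) and not x1
print(term_can_be_nonempty([[1], [-1]], 1))     # False: x1 and not x1
```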

5 Conclusion 

In this paper, we proved that for an arbitrary (finite or infinite) relational structure Γ the problem ExPos(Γ)
is in LOGSPACE if Γ is locally refutable, or otherwise complete for the class CSP(Γ)_NP under deterministic
polynomial-time many-one reductions. In particular, if Γ is not locally refutable then the problem ExPos(Γ)
is NP-hard. Structures with a finite domain are locally refutable if and only if they are a-valid for some
value a of the domain D. Finally, we presented an example of a structure that shows that our result cannot
be straightforwardly extended to structures Γ with function symbols, since local refutability of Γ no longer
implies that ExPos(Γ) is in LOGSPACE when Γ contains function symbols.